Long-Term F0 Modeling for Text-Independent Speaker Recognition
Abstract
Long-term F0 modeling for text-independent speaker recognition is considered using both parametric and nonparametric approaches. In the parametric case, the mean, variance, skewness, and kurtosis are computed, and the resulting parameter vectors are compared using a weighted Euclidean distance. In the nonparametric case, the F0 distribution is represented by a histogram, and the Kullback-Leibler distance is used in addition to the Euclidean distance. The F0 models are combined with a spectral classifier based on MFCC coefficients, and results on a subset of the NIST 1999 corpus indicate that F0 provides useful additional information, especially for improving verification accuracy in noisy and mismatched training/matching conditions.
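The two kinds of long-term F0 model described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the histogram range and bin count, the smoothing constant, the unit weights, and the synthetic F0 tracks are all assumptions made for the example, and the abstract does not say whether the Kullback-Leibler distance was symmetrized (the plain, asymmetric form is shown here).

```python
import numpy as np


def f0_moments(f0):
    """Parametric model: mean, variance, skewness, kurtosis of voiced-frame F0 (Hz)."""
    f0 = np.asarray(f0, dtype=float)
    mu = f0.mean()
    var = f0.var()
    z = (f0 - mu) / np.sqrt(var)
    # Skewness and (non-excess) kurtosis are the 3rd and 4th standardized moments.
    return np.array([mu, var, np.mean(z ** 3), np.mean(z ** 4)])


def weighted_euclidean(a, b, w):
    """Weighted Euclidean distance; the weights w are an assumed input."""
    a, b, w = (np.asarray(x, dtype=float) for x in (a, b, w))
    return float(np.sqrt(np.sum(w * (a - b) ** 2)))


def f0_histogram(f0, bins=40, f0_range=(50.0, 400.0), eps=1e-6):
    """Nonparametric model: normalized F0 histogram.

    bins, f0_range, and the additive smoothing eps are assumptions; eps keeps
    every bin nonzero so the logarithm in the KL divergence is defined.
    """
    counts, _ = np.histogram(np.asarray(f0, dtype=float), bins=bins, range=f0_range)
    p = counts.astype(float) + eps
    return p / p.sum()


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) between two normalized histograms."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))


if __name__ == "__main__":
    # Synthetic F0 tracks standing in for two speakers (purely illustrative).
    rng = np.random.default_rng(0)
    speaker_a = rng.normal(120.0, 15.0, size=2000)
    speaker_b = rng.normal(200.0, 25.0, size=2000)

    weights = np.ones(4)  # assumed; in practice the features would be scaled
    d_param = weighted_euclidean(f0_moments(speaker_a), f0_moments(speaker_b), weights)
    d_hist = kl_divergence(f0_histogram(speaker_a), f0_histogram(speaker_b))
    print("parametric distance:", d_param)
    print("histogram KL divergence:", d_hist)
```

With unit weights the parametric distance is dominated by the mean and variance, whose numeric scale is much larger than that of skewness and kurtosis, which is one reason a weighted distance is a natural choice for this feature vector.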
Similar resources
Prosodic features based on wavelet analysis for speaker verification
Most conventional speaker recognition systems rely on short-term spectral information and ignore long-term information such as prosody, which also conveys speaker information. In this paper, we propose an approach that extracts prosodic features based on long-term information. First, by applying wavelet analysis, we can reveal the trends of the f0 and energy contours. Subsequently, the pr...
Pitch-dependent GMMs for text-independent speaker recognition systems
Gaussian mixture models (GMMs) and ergodic hidden Markov models (HMMs) have been successfully applied to model short-term acoustic vectors for speaker recognition systems. Prosodic features are known to carry information concerning the speaker’s identity and they can be combined with the short-term acoustic vectors in order to increase the performance of the speaker recognition system. In this ...
Modeling dynamic prosodic variation for speaker verification
Statistics of frame-level pitch have recently been used in speaker recognition systems with good results [1, 2, 3]. Although they convey useful long-term information about a speaker’s distribution of f0 values, such statistics fail to capture information about local dynamics in intonation that characterize an individual’s speaking style. In this work, we take a first step toward capturing such ...
Beyond the long-term mean: exploring the potential of F0 distribution parameters in traditional forensic speaker recognition
Despite its many prima facie attractive properties for Forensic Speaker Recognition, F0 is regarded as having limited forensic value due to its large within-speaker variability. However, its forensic use to date has been limited mostly to its long-term mean and standard deviation. This paper examines the discriminatory potential, within a Likelihood Ratio-based approach, of additional parametri...
LR estimation using long term F0 as a parameter: good, bad or useless? Initial investigation using Japanese data
This paper investigates the validity of LR estimation for long-term F0 using Aitkin (1995)’s formula. Although this formula was developed to estimate the LR of the refractive index of glass fragments, previous studies such as Kinoshita (2001) and Rose, Osanai, and Kinoshita (2003) have shown that Aitkin’s formula can be applied to speech data. The experiments in this study revealed, however, t...